The goal of this project is to create cloropleth map of energy consumption on a per-county level. Data from the EIA gives data on utility level demand and what counties utilities deliver power to. We use data, plus Census data on population, to come up with an estimate of per-county energy consumption. This methodology will be explained below.
Here is some pseudo-code decribing the whole process, from raw data to final map output
In [93]:
import csv
# Column indicies for EIA data
UTILITY = 1
STATE = 3
COUNTY = 4
# Column indicies for EIA retail data
UTILITY_ID = 1
CONSUMPTION = 21
# Column indicies for Census data
ID = 1
DESC = 2
POP2012 = 7
# Global variables
countyToUtility = {} # Mapping from county number to a list of utilities serving it
utilityToCounty = {} # Mapping from utility id to a list of counties it serves
countyToPopulation = {} # Mapping from county number to a population from census data
nameToID = {} # Mapping from county name to the county code
utilityToConsumption = {} # Mapping the utility id to the total consumption in mWh
utilityToPopulation = {} # Mapping the utility id to the total county population it serves
countyToConsumption = {} # Mapping the final result of county ID to consumption in mWh
# Constants
YEAR = 2012
PATH_TO_RETAIL_DATA = "data/f8612012/retail_sales_%s.csv" % YEAR # Path to the defined utility service territories
PATH_TO_SERVICE_DATA = "data/f8612012/service_territory_%s.csv" % YEAR # Path to the defined utility service territories
PATH_TO_POPULATION_DATA = "data/PEP_2013_PEPANNRES/PEP_2013_PEPANNRES_with_ann_with_changes.csv"
STATES = { 'AK': 'Alaska','AL': 'Alabama','AR': 'Arkansas','AS': 'American Samoa','AZ': 'Arizona','CA': 'California','CO': 'Colorado','CT': 'Connecticut','DC': 'District of Columbia','DE': 'Delaware','FL': 'Florida','GA': 'Georgia','GU': 'Guam','HI': 'Hawaii','IA': 'Iowa','ID': 'Idaho','IL': 'Illinois','IN': 'Indiana','KS': 'Kansas','KY': 'Kentucky','LA': 'Louisiana','MA': 'Massachusetts','MD': 'Maryland','ME': 'Maine','MI': 'Michigan','MN': 'Minnesota','MO': 'Missouri','MP': 'Northern Mariana Islands','MS': 'Mississippi','MT': 'Montana','NA': 'National','NC': 'North Carolina','ND': 'North Dakota','NE': 'Nebraska','NH': 'New Hampshire','NJ': 'New Jersey','NM': 'New Mexico','NV': 'Nevada','NY': 'New York','OH': 'Ohio','OK': 'Oklahoma','OR': 'Oregon','PA': 'Pennsylvania','PR': 'Puerto Rico','RI': 'Rhode Island','SC': 'South Carolina','SD': 'South Dakota','TN': 'Tennessee','TX': 'Texas','UT': 'Utah','VA': 'Virginia','VI': 'Virgin Islands','VT': 'Vermont','WA': 'Washington','WI': 'Wisconsin','WV': 'West Virginia','WY': 'Wyoming'}
This function reads in the census data, and loads a dictionary that maps the county name to its county ID (nameToID), and another dictionary that maps from the county ID to the population estimate in 2012 (countyToPopulation). The county name is constructed as "state_countyname", all lowercase
In [94]:
def loadCensusData():
''' Loads the mapping from county number to population into
'countyToUPopulation' and a name to ID mapping from the PATH_TO_POPULATION_DATA file. '''
f = open(PATH_TO_POPULATION_DATA)
reader = csv.reader(f)
reader.next()
reader.next()
for row in reader:
# Grab the important parts of the data
id = int(row[ID])
desc = row[DESC]
pop2012 = row[POP2012]
# Derice the 'county key' from the description column
(county, state) = desc.split(',')
county = county.lower().replace("county", "").replace(".", "").replace(" ", "")
state = state.lower().replace(' ', '')
key = state + '_' + county
# correction to Lousiana county names
if state == "louisiana":
key = key.replace("parish", "")
# Setup the two mappings
nameToID[key] = id
countyToPopulation[id] = int(pop2012)
This function loads a dictionary, countyToUtility, that maps the county ID to the list of utilities that serve it (and a reverse mapping)
In [95]:
def loadCountytoUtilityData():
''' Loads the mapping from county number to utilies into
'countyToUtility' from the PATH_TO_SERVICE_DATA file. '''
f = open(PATH_TO_SERVICE_DATA)
reader = csv.reader(f)
reader.next()
for row in reader:
state = STATES[ row[STATE].upper() ].lower().replace(' ', '')
county = row[COUNTY].lower().replace(' ', '').replace('.', '')
key = state + '_' + county
utilityID = int(row[UTILITY])
try:
if nameToID[key] in countyToUtility:
countyToUtility[nameToID[key]].add(utilityID)
else:
#if key not in nameToID:
# print "key %s not found" % key
countyToUtility[nameToID[key]] = set([utilityID])
if utilityID in utilityToCounty:
utilityToCounty[utilityID].add(nameToID[key])
else:
utilityToCounty[utilityID] = set([nameToID[key]])
except Exception as e:
pass
This function loads the EIA data as a mapping from utility ID to the consumption data
In [96]:
def loadUtilityConsumptionData():
f = open(PATH_TO_RETAIL_DATA)
reader = csv.reader(f)
reader.next()
reader.next()
reader.next()
for row in reader:
id = int(row[UTILITY_ID])
consumption = int(row[CONSUMPTION].replace(',', ''))
utilityToConsumption[id] = consumption
We now define a function to map the utility ID to the sum of the populations of the counties it serves. This will help in the stimation of how much consumption each county was responsible for:
In [ ]:
def deriveUtilityPopulation():
for utility, counties in utilityToCounty.items():
totPopulation = 0
for county in counties:
totPopulation += countyToPopulation[county]
utilityToPopulation[utility] = totPopulation
Now, for the main event: lets make our estimate for the consumption estimate of each county:
In [97]:
def calculateCountyConsumption():
for county, utilities in countyToUtility.items():
try:
countyToConsumption[county] = sum([((countyToPopulation[county] / utilityToPopulation[utility]) * utilityToConsumption[utility]) for utility in utilities])
except KeyError:
countyToConsumption[county] = 0
Ok, so lets load the data and take a look at what we have:
In [98]:
# Start by loading the census data and the service territory mapping
loadCensusData()
loadCountytoUtilityData()
loadUtilityConsumptionData()
deriveUtilityPopulation()
calculateCountyConsumption()
In [99]:
# Print random sample of county name to list of utility IDs
for county, utilityList in countyToUtility.items()[0:10]:
print "%s is served by utility IDs %s" % (county, list(utilityList))
In [100]:
# Print random sample of county name to list of utility IDs
for name, countyID in nameToID.items()[0:10]:
print "%s (%s) had a population of %s" % (name, countyID, countyToPopulation[countyID])
In [101]:
# Print random sample of utility ID to consumption data
for utilityID, consumption in utilityToConsumption.items()[0:10]:
print "County ID %s consumed %s mWh" % (utilityID, consumption)
In [102]:
In [102]:
In [102]:
In [102]: